Maximizing data retention from the ISBSG repository

Authors

  • Kefu Deng
  • Stephen G. MacDonell
Abstract

BACKGROUND: In 1997 the International Software Benchmarking Standards Group (ISBSG) began to collect data on software projects. Since then they have provided copies of their repository to researchers and practitioners, through a sequence of releases of increasing size. PROBLEM: Questions over the quality and completeness of the data in the repository have led some researchers to discard substantial proportions of the data in terms of observations, and to discount the use of some variables in the modelling of, among other things, software development effort. In some cases the details of the discarding of data have received little mention and minimal justification. METHOD: We describe the process we used in attempting to maximise the amount of data retained for modelling software development effort at the project level, based on previously completed projects that had been sized using IFPUG/NESMA function point analysis (FPA) and recorded in the repository. RESULTS: Through justified formalisation of the data set and domain-informed refinement we arrive at a final usable data set comprising 2862 (of 3024) observations across thirteen variables. CONCLUSION: A methodical approach to the preprocessing of data can help to ensure that as much data is retained for modelling as possible. Assuming that the data does reflect one or more underlying models, such retention should increase the likelihood of robust models being developed.

Keywords: empirical software engineering, ISBSG repository, data formalisation, effort prediction, regression, FPA

Similar resources

Maximising data retention from the ISBSG repository

BACKGROUND: In 1997 the International Software Benchmarking Standards Group (ISBSG) began to collect data on software projects. Since then they have provided copies of their repository to researchers and practitioners, through a sequence of releases of increasing size. PROBLEM: Questions over the quality and completeness of the data in the repository have led some researchers to discard substan...

Performance Calculation and Benchmarking using the ISBSG Release 10 Data Repository

Traditional benchmarking models in software engineering are typically based on the concept of productivity, first defined as a single ratio of output to input, and then combined with various cost factors leading to a single value. By contrast, the concept of performance is more comprehensive than productivity, and can handle other dimensions as well, like quality. Multidimensional models, such ...

Empirical findings on team size and productivity in software development

The size of software project teams has been considered to be a driver of project productivity. Although there is a large literature on this, new publicly available software repositories allow us to empirically perform further research. In this paper we analyse the relationships between productivity, team size and other project variables using the International Software Benchmarking Standards Gr...

Potential and limitations of the ISBSG dataset in enhancing software engineering research: A mapping review

Context: The International Software Benchmarking Standards Group (ISBSG) maintains a software development repository with over 6,000 software projects. This dataset makes it possible to estimate a project's size, effort, duration, and cost. Objective: The aim of this study was to determine how and to what extent, ISBSG has been used by researchers from 2000, when the first papers were published...

Publication date: 2008